Clustering on Sliding Windows in Polylogarithmic Space

Authors

  • Vladimir Braverman
  • Harry Lang
  • Keith Levin
  • Morteza Monemizadeh
Abstract

In PODS 2003, Babcock, Datar, Motwani and O’Callaghan [4] gave the first streaming solution for the k-median problem on sliding windows using $O\left(\frac{k}{\tau^4} W^{2\tau} \log^2 W\right)$ space, with an $O(2^{O(1/\tau)})$ approximation factor, where $W$ is the window size and $\tau \in (0, \frac{1}{2})$ is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade. In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O’Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result of Babcock et al. In particular, we give the first polylogarithmic space (α, β)-approximation for metric k-median clustering in the sliding window model, where α and β are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky [9], which allows us to apply well-known techniques for solving problems in the sliding window model to functions that are not smooth, such as the k-median cost.

1998 ACM Subject Classification: I.5.3 Clustering

Similar resources

A Unified Approach for Clustering Problems on Sliding Windows

We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O’Callaghan [5], which has remained unanswered for over a decade. O...

Sliding Windows with Limited Storage

We consider time-space tradeoffs for exactly computing frequency moments and order statistics over sliding windows [16]. Given an input of length 2n − 1, the task is to output the function of each window of length n, giving n outputs in total. Computations over sliding windows are related to direct sum problems except that inputs to instances almost completely overlap. • We show an average case ...

Clustering Problems on Sliding Windows

We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O’Callaghan [5], which has remained unanswered for over a decade. O...

Submodular Optimization Over Sliding Windows

Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this question in the context of data streams, where elements arrive one at a time, and we want to design low-memory and fast update-time algorithms that maintain ...

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

Publication date: 2015